{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "# `list.sort` 方法和 `sorted` 函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<code>list.sort</code>方法会**就地**排序列表，所以这个方法的返回值是<code>None</code>。\n",
    "\n",
    "**如果一个函数或者方法对对象进行的是就地改动，那它就应该返回<code>None</code>，让调用者知道传入的参数发生了变动，并且未产生新的对象。**\n",
    "\n",
    "与<code>list.sort</code>不同的是内置函数<code>sorted</code>，其会新建一个列表作为返回值。这个方法可以接收任何形式的可迭代对象作为参数，包括不可变序列或者生成器，不管其接收的是怎样的参数，它最后都返回一个列表。\n",
    "\n",
    "<code>list.sort</code>和<code>sorted</code>都有2个可选的关键字参数：\n",
    "\n",
    "- **`reverse`:** 如果设定为<code>True</code>，被排序的序列里的元素会以降序输出（把最大值当作最小值来排序）。此参数默认值是<code>False</code>。\n",
    "- **`key`:** 一个只有一个参数的函数，这个函数会被用在序列里的每一个元素上，所产生的结果将是排序算法依赖的对比关键字。这个参数的默认值是恒等函数，即默认用元素自己的值来排序。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "fruits = ['apple', 'banana', 'pear', 'raspberry', 'strawberry']\n",
    "print(sorted(fruits))\n",
    "print(sorted(fruits, reverse=True))\n",
    "print(sorted(fruits, key=len))\n",
    "print(sorted(fruits, key=len, reverse=True))\n",
    "print(fruits)\n",
    "fruits.sort()\n",
    "print(fruits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "## 通过某个关键字排序一个字典列表\n",
    "\n",
    "对于如下的字典列表，根据某个或某几个字典字段来排序这个列表：\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "rows = [\n",
    "    {'first_name': 'Brian', 'last_name': 'Jones', 'uid': 1003},\n",
    "    {'first_name': 'David', 'last_name': 'Beazley', 'uid': 1002},\n",
    "    {'first_name': 'John', 'last_name': 'Cleese', 'uid': 1001},\n",
    "    {'first_name': 'Big', 'last_name': 'Jones', 'uid': 1004}\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "可以使用<code>operator</code>模块的<code>itemgetter</code>函数。\n",
    "```\n",
    "itemgetter(item, ...) --> itemgetter object\n",
    "\n",
    "Return a callable object that fetches the given item(s) from its operand.\n",
    "After f = itemgetter(2), the call f(r) returns r[2].\n",
    "After g = itemgetter(2, 5, 3), the call g(r) returns (r[2], r[5], r[3])\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "from operator import itemgetter\n",
    "\n",
    "rows_by_fname = sorted(rows, key=itemgetter('first_name'))\n",
    "rows_by_uid = sorted(rows, key=itemgetter('uid'))\n",
    "rows_by_lfname = sorted(rows, key=itemgetter('last_name', 'first_name'))\n",
    "print(rows_by_fname)\n",
    "print(rows_by_uid)\n",
    "print(rows_by_lfname)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<code>sorted</code>函数的<code>key</code>参数是 callable 类型，并且从列表中接受一个单一元素，然后返回被用来排序的值。\n",
    "<code>itemgetter()</code>函数就是负责创建这个 callable 对象的。\n",
    "\n",
    "<code>operator.itemgetter()</code>函数有一个被排序列表中的记录用来查找值的索引参数。\n",
    "可以是一个字典键名称，一个整形值或者任何能够传入一个对象的<code>\\_\\_getitem\\_\\_()</code>方法的值。\n",
    "如果你传入多个索引参数给<code>itemgetter()</code>，它生成的 callable 对象会返回一个包含\n",
    "所有元素值的元组，并且<code>sorted()</code>函数会根据这个元组中元素顺序去排序。\n",
    "想要同时在几个字段上面进行排序（比如通过姓和名来排序，也就是例子中的那样）的时候这种\n",
    "方法是很有用的。\n",
    "\n",
    "<code>itemgetter()</code>也可以使用lambda表达式代替，比如："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "rows_by_fname = sorted(rows, key=lambda r: r['first_name'])\n",
    "rows_by_lfname = sorted(rows, key=lambda r: (r['last_name'], r['first_name']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "# 使用 `bisect` 模块来管理已排序序列"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "已排序的序列可以用来进行快速搜索，标准库的<code>bisect</code>模块提供了二分查找算法。\n",
    "\n",
    "<code>bisect</code>模块包含两个主要函数，<code>bisect</code>和<code>insort</code>，\n",
    "两个函数都利用二分查找算法来在有序序列中查找或插入元素。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "## 用 `bisect` 来搜索\n",
    "\n",
    "<code>bisect(haystack, needle)</code>在<code>haystack</code>（干草垛）里搜索<code>needle</code>（针）的位置，该位置满足的条件是，把<code>needle</code>插入到这个位置后，<code>haystack</code>还能保持升序，即此函数返回的位置前面的值，都小于或等于<code>needle</code>的值。其中<code>haystack</code>必须是一个有序的序列。\n",
    "\n",
    "可以先用<code>bisect(haystack, needle)</code>查找位置<code>index</code>，再用<code>haystack.insert(index, needle)</code>来插入新值。或者用<code>insort</code>来一步到位，速度会更快一些。\n",
    "\n",
    "```python\n",
    "# bisect_demo.py\n",
    "\n",
    "import bisect\n",
    "import sys\n",
    "\n",
    "HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]\n",
    "NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]\n",
    "\n",
    "ROW_FMT = '{0:2d} @ {1:2d}    {2}{0:<2d}'\n",
    "\n",
    "\n",
    "def demo(bisect_fn):\n",
    "    for needle in reversed(NEEDLES):\n",
    "        position = bisect_fn(HAYSTACK, needle)\n",
    "        offset = position * '  |'\n",
    "        print(ROW_FMT.format(needle, position, offset))\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    if sys.argv[-1] == 'left':\n",
    "        bisect_fn = bisect.bisect_left\n",
    "    else:\n",
    "        bisect_fn = bisect.bisect\n",
    "\n",
    "    print('DEMO:', bisect_fn.__name__)\n",
    "    print('haystack ->', ' '.join('%2d' % n for n in HAYSTACK))\n",
    "    demo(bisect_fn)\n",
    "```\n",
    "\n",
    "```\n",
    "$ python3 bisect_demo.py\n",
    "DEMO: bisect\n",
    "haystack ->  1  4  5  6  8 12 15 20 21 23 23 26 29 30\n",
    "31 @ 14      |  |  |  |  |  |  |  |  |  |  |  |  |  |31\n",
    "30 @ 14      |  |  |  |  |  |  |  |  |  |  |  |  |  |30\n",
    "29 @ 13      |  |  |  |  |  |  |  |  |  |  |  |  |29\n",
    "23 @ 11      |  |  |  |  |  |  |  |  |  |  |23\n",
    "22 @  9      |  |  |  |  |  |  |  |  |22\n",
    "10 @  5      |  |  |  |  |10\n",
    " 8 @  5      |  |  |  |  |8\n",
    " 5 @  3      |  |  |5\n",
    " 2 @  1      |2\n",
    " 1 @  1      |1\n",
    " 0 @  0    0\n",
    "```\n",
    " \n",
    "```\n",
    "$ python3 bisect_demo.py left\n",
    "DEMO: bisect_left\n",
    "haystack ->  1  4  5  6  8 12 15 20 21 23 23 26 29 30\n",
    "31 @ 14      |  |  |  |  |  |  |  |  |  |  |  |  |  |31\n",
    "30 @ 13      |  |  |  |  |  |  |  |  |  |  |  |  |30\n",
    "29 @ 12      |  |  |  |  |  |  |  |  |  |  |  |29\n",
    "23 @  9      |  |  |  |  |  |  |  |  |23\n",
    "22 @  9      |  |  |  |  |  |  |  |  |22\n",
    "10 @  5      |  |  |  |  |10\n",
    " 8 @  4      |  |  |  |8\n",
    " 5 @  2      |  |5\n",
    " 2 @  1      |2\n",
    " 1 @  0    1\n",
    " 0 @  0    0\n",
    "```\n",
    "\n",
    "<code>bisect</code>的表现可以从两个方面来调整。\n",
    "\n",
    "1. 用它的两个可选参数——<code>lo</code>和<code>hi</code>——来缩小搜寻范围。<code>lo</code>的默认值是0，<code>hi</code>的默认值是序列的长度。\n",
    "2. <code>bisect</code>起始是<code>bisect_right</code>的别名，对应的函数是<code>bisect_left</code>。\n",
    "\n",
    "<code>bisect</code>可用来建立一个用数字作为索引的查询表格，比如把分数和成绩对应起来。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "import bisect\n",
    "\n",
    "def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):\n",
    "    i = bisect.bisect(breakpoints, score)\n",
    "    return grades[i]\n",
    "\n",
    "print([grade(score) for score in [33, 99, 77, 70, 89, 90, 100]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "## 用 `bisect.insort` 插入新元素\n",
    "\n",
    "<code>insort(seq, item)</code>把变量<code>item</code>插入到序列<code>seq</code>中，\n",
    "并能保持<code>seq</code>的升序顺序。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": false,
    "ein.hycell": false,
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "import bisect\n",
    "import random\n",
    "\n",
    "SIZE = 7\n",
    "\n",
    "random.seed(1730)\n",
    "\n",
    "my_list = []\n",
    "for i in range(SIZE):\n",
    "    new_item = random.randrange(SIZE*2)\n",
    "    bisect.insort(my_list, new_item)\n",
    "    print('%2d ->' % new_item, my_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": "worksheet-0",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<code>insort</code>跟<code>bisect</code>一样，有<code>lo</code>和<code>hi</code>两个可选参数用来控制查找的范围。它也有个变体叫<code>insort_left</code>，这个变体在背后用的是<code>bisect_left</code>。\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "name": "14_sort.ipynb"
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
